Why, in Deep Learning, Non-smooth Activation Function Works Better Than Smooth Ones

Authors

Abstract

Since most dependencies in the physical world are smooth (differentiable), traditionally, smooth functions were used to approximate these dependencies; in particular, smooth neural network activation functions such as the sigmoid. However, the successes of deep learning have shown that in many cases, non-smooth activation functions like $$\max(0,z)$$ (ReLU) work much better. In this paper, we explain why non-smooth approximating functions are often better, even when the approximated dependence itself is smooth.
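One standard intuition for this empirical advantage, offered here as an illustration rather than as the paper's own argument, is the vanishing-gradient effect: the sigmoid's derivative never exceeds 0.25 and decays toward 0 for large $$|z|$$, so gradients shrink as they pass backward through many layers, while the ReLU derivative equals 1 wherever the unit is active. A minimal NumPy sketch (the depth and pre-activation value are arbitrary choices for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: peaks at 0.25 when z = 0, decays for large |z|.
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of max(0, z): exactly 1 on the active side, 0 otherwise.
    return 1.0 if z > 0 else 0.0

z = 2.0     # fixed pre-activation, chosen arbitrarily for this sketch
depth = 20  # number of stacked activations

# The product of per-layer derivatives approximates the factor by which a
# backpropagated gradient is scaled through `depth` stacked activations.
print(sigmoid_grad(z) ** depth)  # ~3e-20: the gradient effectively vanishes
print(relu_grad(z) ** depth)     # 1.0: the gradient survives intact
```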


Similar Resources

Why Bigger Windows Are Better Than Smaller Ones

We investigate the use of multi-term query concepts to improve the performance of text-retrieval systems that accept "natural-language" queries. A relevance feedback process is explained that massively expands an initial query with single and multi-term concepts. The multi-term concepts are modelled as a set of words appearing within windows of varying sizes. Experimental results suggest that w...


Why & When Deep Learning Works: Looking Inside Deep Learnings

In recent years, Deep Learning has emerged as the leading technology for accomplishing a broad range of artificial intelligence tasks (LeCun et al. (2015); Goodfellow et al. (2016)). Deep learning is the state-of-the-art approach across many domains, including object recognition and identification, text understanding and translation, question answering, and more. In addition, it is expected to pla...


Smooth biproximity spaces and P-smooth quasi-proximity spaces

The notion of a smooth biproximity space $(X,\delta_1,\delta_2)$, where $\delta_1,\delta_2$ are gradation proximities, was defined by Ghanim et al. [10]. In this paper, we show that every smooth biproximity space $(X,\delta_1,\delta_2)$ induces a supra smooth proximity space $\delta_{12}$ finer than $\delta_1$ and $\delta_2$. We study the relationship between $(X,\delta_{12})$ and the $FP^*$-separation axioms which had been introduced by...


Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms existing algorithms driven by optimism, such as UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it. We leverage this insight to establish an $\tilde{O}(H\sqrt{SAT})$ Bayesian regret bound for PSRL in finite-horizon episodic Markov decision ...
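For readers unfamiliar with the posterior-sampling principle that PSRL extends to episodic MDPs, here is a toy sketch of Thompson sampling on a three-armed Bernoulli bandit. It is an illustrative simplification under assumed arm probabilities, not the PSRL algorithm or the experimental setup of the paper:

```python
import numpy as np

# Toy illustration (not the paper's PSRL): the posterior-sampling principle is
# the same one PSRL applies to episodic MDPs: sample a model from the posterior,
# act greedily with respect to the sample, then update on the observed outcome.
rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])  # hypothetical arm reward probabilities
alpha = np.ones(3)                       # Beta(1, 1) prior per arm
beta = np.ones(3)

for t in range(2000):
    sampled = rng.beta(alpha, beta)      # one posterior sample per arm
    arm = int(np.argmax(sampled))        # act greedily w.r.t. the sample
    reward = float(rng.random() < true_means[arm])
    alpha[arm] += reward                 # conjugate Beta-Bernoulli update
    beta[arm] += 1.0 - reward

print(alpha / (alpha + beta))            # posterior means concentrate on the truth
```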


Ten good reasons why structured graphs can be better than flat ones

This talk presents our proposal, called ADR, for the design of reconfigurable software systems. ADR is based on hierarchical graphs with interfaces and was conceived in an attempt to reconcile software architectures and process calculi by means of graphical methods. We illustrate the main motivations behind ADR and the current advancements on its foundations, applications and tool su...



Journal

Journal title: Studies in Systems, Decision and Control

Year: 2023

ISSN: 2198-4182, 2198-4190

DOI: https://doi.org/10.1007/978-3-031-16415-6_16